Aside

Disclaimer

Last updated on 2020-12-16.

Main

Anderson Banihirwe

I contribute to and maintain several libraries in the open source scientific Python stack, with a focus on scaling Python tools to handle terabyte-scale datasets on HPC and cloud platforms.

Education

B.S., Computer Systems Engineering

University of Arkansas at Little Rock

Little Rock, AR

2018 - 2014

Professional Experience

Software Engineer II

National Center for Atmospheric Research

Boulder, CO

present - 2020-10

  • Created jupyter-forward, a JupyterLab port-forwarding utility that simplifies running Jupyter on remote resources.
  • Served as a core developer of xarray, an open source library for working with multidimensional labeled datasets and arrays in Python.

Software Engineer I

National Center for Atmospheric Research

Boulder, CO

2020-09 - 2018-10

  • Led the intake-esm project, a Python data cataloguing package for exploring and ingesting Earth System Model datasets.
  • Contributed to the core software stack powering the Pangeo Project, including xarray and Dask.
  • Assisted with the development and deployment of live (virtual or in-person) and online, self-paced educational material.

Software Developer Intern

Quansight

Austin, TX

2018-09 - 2018-05

  • Developed xndframes, a Pandas ExtensionDtype/Array backed by xnd, a container type that maps most Python values relevant for scientific computing directly to typed memory.
  • Worked on integrating cuDF, a GPU dataframe library, with the Apache Arrow library.

Data Science Intern

First Orion

Little Rock, AR

2018-04 - 2017-11

  • Built scoring and predictive models with scikit-learn, Dask, and Apache Spark using First Orion’s proprietary telecommunications data.

Research Intern

National Center for Atmospheric Research

Boulder, CO

2017-08 - 2017-05

  • Developed spark-xarray, a Python package that integrates PySpark and xarray for climate data analysis.

Selected Publications, Posters, and Talks

Cloud-Native Repositories for Big Scientific Data

Computing in Science and Engineering

N/A

2020-11

  • Authored with Ryan Abernathey, Tom Augspurger, et al.

Pangeo Benchmarking Analysis: Object Storage vs. POSIX File System

Fifth International Parallel Data Systems Workshop @ SC 20

N/A

2020-10

  • Authored with Haiying Xu, Kevin Paul.

The Pangeo Ecosystem: Interactive Computing Tools for the Geosciences: Benchmarking on HPC

2019 Supercomputing Conference Workshop on Interactive High-Performance Computing

N/A

2020-01

  • Authored with Tina Erica Odaka, Guillaume Eynard-Bontemps, Aurelien Ponte, Guillaume Maze, Kevin Paul, Jared Baker, Ryan Abernathey.

Zarr: chunked, compressed, multidimensional arrays

2020 Cloud Native Geospatial Outreach Day

Online

2020-09

  • Invited talk about Zarr, an open source data format for the storage of chunked, compressed, multidimensional arrays.

Intake-ESM – Making It Easier To Consume Climate and Weather Data

2020 ESIP Summer Meeting

Online

2020-07

  • Invited talk about intake-esm, an intake plugin for working with Earth System Model (ESM) datasets.

Perceptual Judgments to Detect Computer Generated Forged Faces in Social Media

IAPR Workshop on Multimodal Pattern Recognition of Social Signals in Human-Computer Interaction

N/A

2019-01

  • Authored with Suzan Anwar, Mariofanna Milanova, Mardin Anwer.

Interactive Supercomputing with Dask and Jupyter

2019 Scientific Computing with Python conference

Austin, TX

2019-07

  • Contributed talk about Dask and Jupyter.

Beyond Matplotlib - Tutorial: Building Interactive Climate Data Visualizations with Bokeh and Friends

2018 UCAR Software Engineering Assembly conference

Boulder, CO

2018-04

  • Contributed tutorial about interactive visualization with Python.

PySpark for “Big” Atmospheric Data Analysis

Eighth Symposium on Advances in Modeling and Analysis Using Python

Austin, TX

2018-01